European Journal of Epidemiology
Springer Science and Business Media LLC
Preprints posted in the last 30 days, ranked by how well they match European Journal of Epidemiology's content profile, based on 40 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.
Franzese, F.; Bergmann, M.; Burzynska, A.
Socioeconomic inequalities in health and well-being are a major public health concern, particularly in ageing populations. Education is a key determinant shaping multiple aspects of health outcomes. We used cross-sectional data from wave 9 of the German sample (n=4,148) of the Survey of Health, Ageing and Retirement in Europe (SHARE) to test whether formal education is associated with well-being in later adulthood, with health literacy, self-rated health, and preventive health behaviours as possible mediators. Our results showed that education was positively associated with well-being, but only via indirect pathways. Specifically, self-rated health, health literacy, and fruit and vegetable consumption mediated the relationship between education and well-being, accounting for 54.7, 24.7, and 12.6 percent of the total effect, respectively. In addition, education was significantly and positively correlated with health literacy, high-intensity physical activity, daily fruit and vegetable consumption, and preventive health check-ups, and negatively correlated with smoking. In contrast, alcohol consumption was more common among those with higher levels of education. All health behaviours and health literacy were correlated directly or indirectly (i.e., mediated by health) with well-being. These findings highlight the importance of examining indirect pathways linking education to well-being in later life. Interventions aimed at improving health literacy and promoting healthy behaviours may help reduce educational inequalities in quality of life among older adults.
Larsen, T. E.; Lorca, M. H.; Ekstrom, C. T.; Vinding, R.; Bonnelykke, K.; Strandberg-Larsen, K.; Petersen, A. H.
Childhood weight development, especially overweight and obesity, has been associated with mental health, but their dynamic, causal relationships, and whether these differ by sex, remain unclear. We applied causal discovery to data from the Danish National Birth Cohort (n=67,593) spanning six periods from pregnancy to late adolescence and considering 67 variables related to child and parental weight, mental health, lifestyle, and socio-economic factors. We found no statistically significant difference between the causal graphs for boys and girls (P=0.079). The data-driven models found causal influence of childhood weight on subsequent weight status. Mental health pathways were exclusively within or across adjacent periods and centered on early adolescent stress. We examined the interplay between a subset of mental health variables, containing information on externalizing and internalizing problems, and weight, and found no direct causal pathway between the two processes. These findings suggest that observed links between weight and these mental health measures may be attributable to confounding. Our findings demonstrate the value of data-driven causal discovery in large cohort studies and how to test for differences in causal mechanisms across subgroups. Results are available in an interactive application, enabling future research to further explore the interplay between weight and mental health.
Orwa, F. O.; Mutai, C.; Nizeyimana, I.; Mwangi, A.
When randomized controlled trials are impractical, interrupted time series (ITS) designs offer a rigorous quasi-experimental approach to assessing population-level policies, and ITS is commonly regarded as the most robust of the quasi-experimental designs (QEDs). However, ITS designs are susceptible to serial correlation and to confounding by time-varying factors associated with both the intervention and the outcome, which may bias inference. We therefore provide a simulation-based comparison of controlled interrupted time series (CITS) and multivariable negative binomial regression for estimating policy effects in count time series data. These approaches are widely used in policy evaluations, yet their comparative performance in typical population health settings has rarely been examined directly. We tested both approaches under a variety of data-generating scenarios that differed in series length, intervention effect size, and magnitude of lag-1 autocorrelation. Performance was assessed in terms of bias, standard error calibration, confidence interval coverage, mean squared error, and statistical power. Both methods gave unbiased estimates for moderate and large intervention effects, although bias was more pronounced for small effects, particularly in short series. Although point estimate performance was similar, inferential properties differed substantially. CITS consistently had smaller mean squared error, better agreement between model-based and empirical standard errors, and confidence interval coverage near the nominal 95% level under weak to moderate autocorrelation. By contrast, multivariable regression was more sensitive to serial dependence, leading to underestimated standard errors and undercoverage, especially at moderate to high autocorrelation, even with Newey-West adjustments. These findings show the benefits of using a concurrent control series and the importance of structurally accounting for serial correlation when studying population-level policies with time series data.
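The standard error miscalibration described in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation. It assumes a Gaussian outcome instead of counts, ordinary least squares instead of negative binomial regression or CITS, a single level-change term, and hypothetical parameter values (60 time points, lag-1 autocorrelation of 0.6, 500 replications). The function names are illustrative. The sketch compares the average model-based standard error of the level-change estimate with the empirical spread of that estimate across replications; a ratio below 1 suggests the model-based standard error is too small.

```python
import numpy as np


def fit_level_change(rng, n=60, rho=0.6, effect=1.0):
    """Simulate one series with AR(1) errors and fit a segmented OLS model.

    Returns the estimated level change and its model-based standard error.
    All parameter values are illustrative assumptions, not the paper's.
    """
    t = np.arange(n, dtype=float)
    post = (t >= n // 2).astype(float)  # 1 after the hypothetical policy date

    # Stationary AR(1) errors with lag-1 autocorrelation rho.
    innov = rng.normal(size=n)
    err = np.empty(n)
    err[0] = innov[0] / np.sqrt(1.0 - rho**2)
    for i in range(1, n):
        err[i] = rho * err[i - 1] + innov[i]

    y = 10.0 + 0.05 * t + effect * post + err
    X = np.column_stack([np.ones(n), t, post])

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)  # formula assumes independent errors
    return beta[2], np.sqrt(cov[2, 2])


def se_calibration_ratio(rho, reps=500, seed=0):
    """Mean model-based SE divided by the empirical SD of the estimates."""
    rng = np.random.default_rng(seed)
    results = np.array([fit_level_change(rng, rho=rho) for _ in range(reps)])
    return results[:, 1].mean() / results[:, 0].std(ddof=1)


if __name__ == "__main__":
    for rho in (0.0, 0.6):
        print(f"rho={rho}: SE ratio = {se_calibration_ratio(rho):.2f}")
```

With independent errors the ratio should sit near 1; with positive autocorrelation it should fall well below 1, which is the mechanism behind confidence interval undercoverage when serial dependence is ignored.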
Qi, X.; Qi, H.; Li, N.; Wang, T.; Wang, W.; Song, X.; Mi, B.; Zhang, D.
Background and aims: Mental and behavioral disorders due to use of tobacco (MBDT) present a critical challenge to global health, yet modifiable lifestyle factors that reduce their risk remain poorly understood. Given that dietary fibre can affect mental health through gut-brain communication, we sought to explore how fibre intake relates to MBDT risk in smokers. Methods: Using the UK Biobank (UKB) database, we evaluated the link between dietary fibre intake and MBDT within a smoking population in cross-sectional (N=19,943) and prospective cohort (N=19,885) analyses, applying logistic and Cox proportional hazards models, respectively. To assess potential causality, two-sample Mendelian randomization (MR) was applied, relying on GWAS summary data from the IEU Open GWAS Project and FinnGen repositories. Results: In cross-sectional analyses, individuals in the top quartile (Q4) of fibre intake had lower MBDT risk than those in the bottom quartile (Q1) (OR: 0.32, 95% CI: 0.13-0.79). Over a median follow-up of 12.84 years, the prospective analysis showed a marked inverse association (Q4 HR: 0.46, 95% CI: 0.40-0.54). Non-linear modeling with restricted cubic splines revealed an L-shaped dose-response curve. Furthermore, MR results confirmed a protective causal effect of genetically predicted fibre intake (IVW OR: 0.68, 95% CI: 0.49-0.95), which remained consistent across sensitivity analyses. Conclusions: Among smokers, higher dietary fibre intake is robustly associated with a reduced risk of mental and behavioral disorders due to the use of tobacco, offering a modifiable dietary target for public health interventions.
Makinen, V.-P.; Kahonen, M.; Lehtimaki, T.; Hutri, N.; Ronnemaa, T.; Viikari, J.; Pahkala, K.; Rovio, S.; Niinikoski, H.; Mykkanen, J.; Raitakari, O.; Ala-Korpela, M.
Background and aims: Direct evidence to connect early life metabolism with cardiometabolic diseases in old age is limited due to the rarity of multi-decadal biochemical follow-up studies. To gain deeper insight into metabolic ageing, we conducted a longitudinal study that integrates serial data on clinical biomarkers, metabolomics and clinical events across the human life course. Methods: Children born in 1962-1992 were included from four European cohorts. Time-series of clinical biomarkers and metabolomics data were available for 8,653 participants (ages 0-49 years, 142 molecular and four physiological variables). Comparable data for 13,795 UK Biobank participants at two visits (ages 40-79 years) were linked with retrospective and prospective records of diabetes and cardiovascular disease. Lifetime metabolic trajectories were reconstructed by unsupervised machine learning and local polynomial regression. Results: A stable stratification in metabolic health emerged in children between ages 3 and 12 years and persisted to old age. We summarized this population pattern by assigning each participant into one of seven metabolic subgroups with characteristic biomarker trajectories. Two subgroups (MetDys TG+ and MetDys TG-) featured increased waist-height ratio from childhood, persistently higher C-reactive protein throughout life and rapidly increasing fasting insulin between 30 and 49 years of age. Both subgroups exhibited high risk for diabetes (HR > 13) and ischemic heart disease (HR > 2.5) when compared against the lowest risk subgroup (High HDL ApoB-). Conclusions: This life-course analysis shows that metabolic dysfunction associated with excess weight gain begins in early childhood and is associated with cardiometabolic morbidity in later life.
Tampubolon, G.
Population ageing increases the importance of cognitive capacity for making decisions about retirement and living independently beyond it. We tested whether post-war educational expansion and working-life social mobility eliminate the association between social class of origin and cognition in early old age using the 1958 National Child Development Study. Two outcomes were analysed at age 62: standard episodic memory (immediate + delayed word recall) and long-term episodic memory, capturing accurate half-century recall of childhood household facts (rooms and people at age 11 validated against mothers' responses). Social mobility trajectories derived in prior work were classified into predominantly manual versus non-manual class trajectories. Models were estimated separately for women and men across three specifications: (i) social origin and controls, (ii) adding social mobility, and (iii) adding weighting to address healthy survivor bias. Education was consistently associated with both outcomes. For long-term episodic memory, social origin gradients were clearer than for short-term episodic memory, with men from service/professional origins showing a 13 percentage-point higher probability of accurate half-century recall than men from manual origins. These findings indicate that education expansion and working-life social mobility failed to release the grip of social origin on long-term episodic memory.
Goncalves, B. P.; Franco, E. L.
Timeliness of therapy initiation is a fundamental determinant of outcomes for many medical conditions, most importantly, cancer. Yet, existing inefficiencies in healthcare systems mean that delays between diagnosis and treatment frequently adversely affect the clinical outcome for cancer patients. Although estimates of effects of lag time to therapy would be informative to policymakers considering resource allocation to minimize delays in oncology, causal methods are seldom explicitly discussed in epidemiologic analyses of these lag times. Here, we propose causal estimands for such studies, and outline the protocol of a target trial that could be emulated with observational data on lag times. To illustrate the application of this approach, we simulate studies of lag time to treatment under two scenarios: one in which indication bias (Waiting Time Paradox) is present and another in which it is absent. Although our discussion focuses on oncologic outcomes, components of the proposed target trial could be adapted to study delays for other medical conditions. We believe that the clarity with which causal questions are posed under the target trial emulation framework would lead to improved quantification of the effects of lag times in oncology, and hence to better informed policy decisions.
Yang, S.; Grilli, M. D.; Wootton, R. E.; van de Weijer, M. P.; Treur, J. L.; Klimentidis, Y. C.; Sbarra, D. A.
Age-related hearing loss is linked to loneliness and poorer cognitive health, but it remains unclear whether loneliness helps explain associations between hearing difficulties and cognitive performance or dementia, and whether these patterns reflect causal pathways or shared underlying liability. In this preregistered study, we triangulated analyses across multiple data sources spanning approximately 18 years of observational data with 8 sources of molecular genetic information to examine whether loneliness helps explain the association between hearing difficulty and cognitive performance, Alzheimer's disease dementia, and all-cause dementia, and whether hearing-aid use may buffer this association. In longitudinal parallel-process latent growth curve models (N = 10,375) using nine waves of longitudinal data from the Survey of Health, Ageing and Retirement in Europe (SHARE), poorer hearing was associated with greater loneliness, and greater loneliness was associated with poorer cognitive performance, consistent with partial mediation. In contrast, worsening hearing over time was not clearly associated with increasing loneliness over time. Cumulative hearing-aid use did not appear to alter long-term loneliness trajectories, although current hearing-aid use weakened the concurrent association between poorer hearing and greater loneliness. In genetic analyses, we found little evidence that hearing phenotypes or loneliness had clear total or indirect effects on Alzheimer's disease dementia or all-cause dementia. Analyses accounting for shared genetic liability with neuroticism provided some evidence linking loneliness with poorer cognitive performance, and colocalization analyses further suggested shared genetic architecture across hearing, loneliness, cognition, and neuroticism-related traits. 
Overall, the findings support a robust cross-domain association between poorer hearing, greater loneliness, and poorer cognitive performance, while suggesting that long-term change and genetic evidence are more consistent with shared liability than with a single causal pathway.
Jones, L. V.; Barnett, A.; Hartel, G.; Vagenas, D.
Background: Reproducibility concerns in health research have grown, as many published results fail to be independently reproduced. Achieving computational reproducibility, where others can replicate the same results using the same methods, requires transparent reporting of statistical tests, models, and software use. While data-sharing initiatives have improved accessibility, the actual usability of shared data for reproducing research findings remains underexplored. Addressing this gap is crucial for advancing open science and ensuring that shared data meaningfully support reproducibility and enable collaboration, thereby strengthening evidence-based policy and practice. Methods: A random sample of 95 PLOS ONE health research papers from 2019 reporting linear regression was assessed for data-sharing practices and computational reproducibility. Data were accessible for 43 papers. From the randomly selected sample, the first 20 papers with available data were assessed for computational reproducibility. Three regression models per paper were reanalysed. Results: Of the 95 papers, 68 reported having data available, but 25 of these lacked the data required to reproduce the linear regression models. Only eight of 20 papers we analysed were computationally reproducible. A major barrier to reproducing the analyses was the great difficulty in matching the variables described in the paper to those in the data. Papers sometimes failed to be reproduced because the methods were not adequately described, including variable adjustments and data exclusions. Conclusion: More than half (60%) of analysed studies were not computationally reproducible, raising concerns about the credibility of the reported results and highlighting the need for greater transparency and rigour in research reporting. When data are made available, authors should provide a corresponding data dictionary with variable labels that match those used in the paper. 
Analysis code, model specifications, and any supporting materials detailing the steps required to reproduce the results should be deposited in a publicly accessible repository or included as supplementary files. To increase the reproducibility of statistical results, we propose a Model Location and Specification Table (MLast), which tracks where and what analyses were performed. In conjunction with a data dictionary, MLast enables the mapping of analyses, greatly aiding computational reproducibility.
O'Connor, M.; O'Connor, E.; Hughes, E. K.; Bann, D.; Knight, K.; Tabor, E.; Bridger-Staatz, C.; Gray, S.; Burgner, D.; Olsson, C. A.
Background: Population-based cohort studies are increasingly expected to demonstrate benefits for public health and wider society. However, there is limited systematic evidence on what such impact entails or how it is generated and sustained. To address this gap, we examined researcher perspectives on the impact of cohort studies. Methods: We conducted, to our knowledge, the first quantitative study of researcher views on cohort impact, recruiting active cohort researchers through national and international networks between August and December 2025. The anonymous cross-sectional survey captured researcher characteristics, perceived contributions, impact processes, challenges, and open-ended reflections. Results: A total of 163 cohort researchers participated, primarily from Australia (42%) and the UK (23%). Participants perceived their work as informing a wide range of societal issues and reported investing an average of 24% of their work time in impact-related activities. While most respondents (73%) believed their research leads to tangible policy or practice change, two thirds indicated that impact is rarely or never demonstrable shortly after study completion (67%) and seldom attributable to a single study (67%). Key concerns included pressure to overstate contributions (80%), perceived disadvantages for cohort studies in impact assessments (78%), and inadequate skills or resources to achieve impact (65%). Conclusions: Cohort researchers perceive their work as generating broad societal contributions and invest substantial effort in supporting impact. However, they face systemic challenges in both achieving and demonstrating impact. These findings highlight the need for impact frameworks that better capture complexity, long-term influence, and cumulative contributions, while mitigating unintended consequences.
Pae, B. J.; Li, L.; Wood, K.; Soliman, E. Z.; Chen, L. Y.; Norby, F. L.; Windham, B. G.; Alonso, A.
Background: Poor physical function has been associated with higher cardiovascular disease (CVD) risk. However, the association between physical function and atrial fibrillation (AF) remains understudied. A comprehensive investigation of the association between physical function and incident AF risk could highlight a novel target for AF prevention. Methods: A total of 4,803 participants without diagnosed AF from the Atherosclerosis Risk in Communities (ARIC) Study cohort with physical function assessed in 2011-2013 were studied. Physical function was measured using the Short Physical Performance Battery (SPPB), 4-meter walk time, and grip strength. Hospital discharge codes and death certificates were used to ascertain incident AF through 2022, and through 2020 for participants from Jackson. Cox regression was used to assess the association between physical function and incident AF risk, adjusting for multiple covariates. Z-score transformations were performed to identify the physical function measure most strongly associated with incident AF risk, and SPPB component analysis was performed to identify the most influential SPPB component. Results: Mean age of the study participants was 75.1 ± 5.0 years; 41.2% were male and 22.2% were Black. During a median follow-up of 9.2 years, there were 809 incident AF events. SPPB (HR: 0.93, 95% CI: 0.90-0.96, per 1-point increase) and grip strength (HR: 0.87, 95% CI: 0.78-0.96, per 10 kg increase) were inversely associated with incident AF risk, while 4-meter walk time (HR: 1.08, 95% CI: 1.03-1.13, per 1-second increase) was positively associated with incident AF risk. SPPB had the strongest association with incident AF risk. Within SPPB, only the chair stand component was significantly associated with incident AF risk. Conclusions: The findings suggest that better physical function is associated with reduced incident AF risk, with higher SPPB having the strongest association. Given the modifiable nature of physical function, these findings highlight a potential novel target for AF prevention in aging populations.
Bui, L. V.; Nguyen, D. N.
Background. Vietnam's disease burden has shifted from communicable, maternal, neonatal, and nutritional (CMNN) causes to non-communicable diseases (NCDs), but the tempo, drivers, and regional positioning of this transition have not been jointly quantified. We characterised Vietnam's epidemiological transition 1990-2023 against ten Southeast-Asian (SEA) peers. Methods. Using Global Burden of Disease 2023 data, we computed joinpoint-regression AAPC with 95% CI (BIC-penalised, up to three break-points) for age-standardised DALY rates and cause-composition shares. We applied Das Gupta three-factor decomposition to 1990-2023 absolute DALY change (population-size, age-structure, age-specific-rate effects) and benchmarked Vietnam's NCD share against an SDI-conditional peer trajectory via leave-one-out quadratic regression. Premature mortality was quantified as WHO 30q70 under both broad NCD and strict SDG 3.4.1 definitions, using Chiang II life-table adjustment identically across all eleven countries. Findings. The CMNN age-standardised DALY rate fell from 13,295.9 to 4,022.1 per 100,000 (AAPC -4.63%/year; 95% CI -4.80 to -4.46); the NCD rate fell only from 21,688.2 to 19,282.8 (AAPC -0.37; -0.45 to -0.30). NCD share of total DALYs rose from 52.99% to 70.67% (+17.67 pp; AAPC +1.09). Vietnam ranked fourth of eleven SEA countries in 2023 (up from sixth in 1990) and sat 5.3% above the SDI-expected trajectory. Das Gupta decomposition attributed the +10.63 million NCD DALY increase to population growth (+6.26 M) and ageing (+6.08 M); rate change removed only 1.71 M. Premature NCD mortality fell from 25.02% to 21.80% (broad, 12.9% reduction) and from 22.17% to 19.50% (SDG 3.4.1, 12.0%; Vietnam sixth of eleven) - far short of the SDG 3.4 one-third-reduction target. Interpretation. Vietnam has entered a disability- and ageing-dominated NCD phase. Meeting SDG 3.4 by 2030 requires population-scale primary prevention sized to demographic momentum.
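The headline figures in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the rounded values quoted above; the variable and function names are illustrative, and the one-third threshold is the SDG 3.4 target mentioned in the abstract. It does not reproduce the Das Gupta decomposition itself, only the consistency of its reported components.

```python
# Decomposition components quoted in the abstract (millions of NCD DALYs).
population_growth = 6.26
ageing = 6.08
rate_change = -1.71  # falling age-specific rates offset part of the increase

total_change = population_growth + ageing + rate_change


def relative_reduction(start, end):
    """Percentage reduction from start to end."""
    return 100.0 * (start - end) / start


# Premature NCD mortality (30q70, %) in 1990 and 2023, as quoted.
broad_reduction = relative_reduction(25.02, 21.80)
strict_reduction = relative_reduction(22.17, 19.50)

# Change in NCD share of total DALYs, in percentage points, from rounded shares.
share_change_pp = 70.67 - 52.99

SDG_TARGET = 100.0 / 3.0  # one-third reduction

if __name__ == "__main__":
    print(f"Net NCD DALY change: {total_change:.2f} million")
    print(f"Broad definition reduction: {broad_reduction:.1f}%")
    print(f"SDG 3.4.1 definition reduction: {strict_reduction:.1f}%")
    print(f"NCD share change: {share_change_pp:.2f} pp")
    print(f"Meets one-third target: {strict_reduction >= SDG_TARGET}")
```

The three components sum to the reported 10.63 million, and both relative reductions round to the reported 12.9% and 12.0%. The share change computed from the rounded shares is 17.68 pp against the reported 17.67 pp, a gap that presumably reflects rounding of the underlying values.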
Reisberg, S.; Oja, M.; Mooses, K.; Tamm, S.; Sild, A.; Talvik, H.-A.; Laur, S.; Kolde, R.; Vilo, J.
Background: The increasing availability of routinely collected health data offers new opportunities for population-level research, yet access to comprehensive, linked, and standardised datasets remains limited. We describe EST-Health-30, a large-scale, population-representative health data resource from Estonia. Methods: EST-Health-30 comprises a random 30% sample of the Estonian population (~500,000 individuals), with longitudinal data from 2012 to 2024 and annual updates planned through 2026. Individual-level records are linked across five nationwide databases, including electronic health records, health insurance claims, prescription data, cancer registry, and cause of death records. A privacy-preserving hashing approach ensures consistent cohort inclusion over time while maintaining pseudonymisation. All data are harmonised to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (version 5.4) using international standard vocabularies. Data quality was assessed using established OMOP-based validation frameworks. Results: The dataset contains rich multimodal information on diagnoses, procedures, laboratory measurements, prescriptions, free-text clinical notes, healthcare utilisation, and costs, with high population coverage and longitudinal depth. Data quality assessment showed high completeness and consistency, with 99.2% of applicable checks passing. The age-sex distribution closely reflects the national population, supporting representativeness, though coverage is marginally below the target 30% (29.2%), primarily attributable to recent immigrants without health system contact. The dataset enables construction of detailed clinical cohorts, analysis of disease trajectories, and evaluation of healthcare utilisation and outcomes across the life course. Conclusions: EST-Health-30 is a comprehensive, standardised, and population-representative real-world data resource that supports epidemiological, clinical, and methodological research. 
Its alignment with the OMOP CDM facilitates reproducible analytics and participation in international federated research networks, while secure access infrastructure ensures compliance with data protection regulations.
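The abstract does not describe how the hashing approach works, so the following is only a hypothetical sketch of one way a hash can give stable sample membership together with pseudonymisation. It assumes a keyed hash (HMAC-SHA256) over a person identifier; the key, the identifier format, and the function names are all assumptions made for illustration. Because the hash is deterministic, a person is either always in or always out of the sample across annual updates, and people newly added to the population are assigned by the same rule.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-secret-key"  # assumption: held by the data custodian
SAMPLE_FRACTION = 0.30                   # target sampling fraction from the abstract


def pseudonym(person_id: str) -> str:
    """Keyed hash of the identifier; serves as a stable pseudonym."""
    return hmac.new(SECRET_KEY, person_id.encode("utf-8"), hashlib.sha256).hexdigest()


def in_sample(person_id: str) -> bool:
    """Deterministically decide sample membership from the keyed hash."""
    digest = pseudonym(person_id)
    # Map the first 64 bits of the digest to a number in [0, 1).
    bucket = int(digest[:16], 16) / 16**16
    return bucket < SAMPLE_FRACTION


if __name__ == "__main__":
    ids = [f"person-{i}" for i in range(20_000)]  # made-up identifiers
    share = sum(in_sample(pid) for pid in ids) / len(ids)
    print(f"Share selected: {share:.3f}")
```

Without the key, the pseudonym cannot be recomputed from an identifier, which is the property a plain unkeyed hash would lack.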
Schwendinger, F.; Infanger, D.; Rowlands, A.; Schmidt-Trucksäss, A.
Background: This prospective cohort analysis investigated how age, sex, and body morphology modify associations of physical activity (PA) intensity, duration, and volume with cardiovascular disease (CVD) mortality. Methods: We analysed wrist-worn accelerometer data from 8,661 adults (51.9% women) in the National Health and Nutrition Examination Survey. The outcome was CVD mortality. PA intensity and volume were quantified using the intensity gradient and average acceleration, respectively. Survey-weighted Cox proportional hazards models were used to estimate associations, including interaction terms with age, sex, or body morphology (waist-to-height ratio as indicator of adiposity). Results: Median (interquartile range) follow-up was 81 (69, 94) months. All hazard ratios (HR) compare 50th with 25th percentile. Beneficial associations between CVD mortality and PA were stronger in younger than older adults for intensity (e.g., 45-year-olds: HR=0.47, 95%CI:0.29-0.75 vs 75-year-olds: HR=0.75, 95%CI:0.54-1.06), and volume (e.g., HR=0.18, 95%CI:0.07-0.71 vs 0.29, 95%CI:0.16-0.51). In women, intensity-related associations were stronger than in men (HR=0.45, 95%CI:0.31-0.65 vs HR=0.79, 95%CI:0.50-1.24). Volume-related associations were stronger in men (HR=0.37, 95%CI:0.22-0.60 vs HR=0.24, 95%CI:0.11-0.51), though with earlier plateauing and greater uncertainty. Associations were observed across waist-to-height ratio levels but attenuated at higher values (intensity: waist-to-height ratio 0.5, HR=0.45, 95%CI:0.29-0.69 vs 0.6, HR=0.69, 95%CI:0.49-0.97; volume: 0.5, HR=0.07, 95%CI:0.03-0.17 vs 0.6, HR=0.28, 95%CI:0.17-0.45). Conclusion: Older adults and men may benefit more from increasing PA volume than intensity, whereas younger adults and women may benefit more from higher-intensity PA. Although benefits were observed across adiposity levels, associations were attenuated as adiposity increased, suggesting stronger benefits in individuals with low-to-moderate adiposity.
Pinto, T. F.; Santoro, A.; Oliveira, A. L. G.; Tavares, T. S.; Almeida, A.; Incardona, F.; Marchetti, G.; Cozzi-Lepri, A.; Pinto, J.; Caporali, J. F. M.
Background: How post-COVID-19 condition (PCC) differs from post-acute infection syndromes (PAIS) caused by other respiratory viruses remains uncertain. Comparing these conditions may clarify whether post-acute symptoms reflect specific consequences of SARS-CoV-2 infection or broader post-viral mechanisms. Methods: We conducted a systematic review and meta-analysis of cohort studies comparing persistent symptoms or conditions in adults after SARS-CoV-2 infection with those following other acute respiratory viral infections. PubMed, Embase, and Scopus were searched. Random-effects models were used to estimate pooled risks. Results: Among 9,371 records screened, 22 studies were included and 14 contributed to the meta-analysis. Increased risk after SARS-CoV-2 infection was observed for pulmonary embolism, abnormal breathing, fatigue, hemorrhagic stroke, memory loss/brain fog, and palpitations; heart rate abnormalities showed borderline significance. For most other outcomes pooled estimates were inconclusive. Conclusions: Only a subset of outcomes appears more frequent after SARS-CoV-2 infection, suggesting many symptoms attributed to PCC may reflect broader post-viral syndromes.
Sztaniszlav, A.; Bjorkenheim, A.; Magnuson, A.; Edvardsson, N.; Poci, D.
Background: Socioeconomic factors impact cardiovascular health. We investigated the association between patient education level and incident heart failure (HF), acute myocardial infarction (AMI), and stroke following a first hospitalization with atrial fibrillation (AF). Methods: In this nationwide retrospective cohort study using linked Swedish national registers, we included all patients receiving a diagnosis of AF while hospitalized in Sweden from 1995 through 2008; categorized education level as primary, secondary, or academic; and followed patients for up to five years. Outcomes were first hospitalization for HF, AMI, or stroke. Associations were assessed using sex-stratified Cox proportional hazards models adjusted for age, calendar year of AF diagnosis, and measures of comorbidity burden (Charlson Comorbidity Index) and thromboembolic risk (CHA2DS2VA score). Results: The cohort comprised 263,172 patients (mean age 72.5 ± 10.4 years; 56.2% male). Compared with primary education, secondary and academic educational attainment were associated with lower adjusted risk of HF and AMI in both females and males. For HF, adjusted hazard ratios (HR) were 0.96 (95% CI 0.93-1.00) for secondary and 0.82 (95% CI 0.77-0.87) for academic education for females and 0.93 (95% CI 0.90-0.96) and 0.76 (95% CI 0.72-0.80), respectively, for males. For AMI, adjusted HRs were 0.89 (95% CI 0.85-0.93) and 0.71 (95% CI 0.65-0.78) for females and 0.91 (95% CI 0.87-0.94) and 0.75 (95% CI 0.71-0.80) for males. For stroke, lower adjusted risk was observed only in the academic education group. Baseline comorbidity burden and thromboembolic risk were higher in lower education groups. Conclusions: Education level was inversely associated with risk of incident HF and AMI over five years, while the association with stroke risk was weaker. Documenting education level may help identify patients at increased risk who could benefit from careful monitoring and optimized preventive care.
Ahlqvist, V. H.; Sjoqvist, H.; Gardner, R. M.; Lee, B. K.
Background: Sibling-matched designs control for shared familial confounding but remain vulnerable to non-shared confounders. Bi-directional sensitivity analyses, which stratify families by whether the older or younger sibling was exposed, are commonly used to assess carryover effects. We aimed to demonstrate how this methodological approach can introduce severe confounding by parity. Methods: We conducted simulations motivated by a recent epidemiological study. The true causal effect of a hypothetical exposure (prenatal acetaminophen) on neurodevelopmental outcomes was set to strictly null. To introduce parity-related confounding, baseline exposure and outcome probabilities were varied slightly by birth order. We compared conditional logistic regression effect estimates from total sibling models against bi-directional stratified models. Results: In the total simulated sibling cohort, models yielded the true null effect (odds ratio = 1.00) when adjusting for parity. However, the bi-directional analyses exhibited divergent artifactual signals. Because parity is perfectly collinear with exposure in these stratified subsets, it cannot be adjusted for. For example, when the older sibling was exposed, the odds ratio for autism spectrum disorder was 1.68; when the younger was exposed, the odds ratio was 0.60. Conclusions: Divergent estimates in bi-directional sibling analyses can be a predictable artifact of parity confounding rather than evidence of carryover effects or invalidating unmeasured bias. Overall sibling models adjusting for parity may remain robust despite divergent stratified sensitivity results.
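The parity artifact described in this abstract can be reproduced in a few lines. The sketch below is not the authors' simulation: the baseline probabilities are hypothetical, families have exactly two siblings, and no shared familial factors are modelled. It relies on the fact that, for 1:1 matched pairs with a single binary exposure, the conditional logistic regression estimate equals the ratio of the two kinds of outcome-discordant pairs. The outcome is generated independently of exposure, so the true effect is null.

```python
import numpy as np


def matched_pair_or(exposed_outcome, unexposed_outcome):
    """Conditional odds ratio for 1:1 matched pairs (ratio of discordant pairs)."""
    exposed_only = np.sum(exposed_outcome & ~unexposed_outcome)
    unexposed_only = np.sum(~exposed_outcome & unexposed_outcome)
    return exposed_only / unexposed_only


def simulate(n_families=200_000, seed=1):
    """Return stratified odds ratios under a null effect with parity differences."""
    rng = np.random.default_rng(seed)

    # Hypothetical baseline probabilities that differ slightly by birth order.
    exp_older = rng.random(n_families) < 0.50
    exp_younger = rng.random(n_families) < 0.40
    # Outcome does not depend on exposure: the causal effect is null.
    out_older = rng.random(n_families) < 0.03
    out_younger = rng.random(n_families) < 0.02

    # Bi-directional strata: only one sibling exposed, split by which one.
    older_only = exp_older & ~exp_younger
    younger_only = ~exp_older & exp_younger

    or_older_exposed = matched_pair_or(out_older[older_only], out_younger[older_only])
    or_younger_exposed = matched_pair_or(out_younger[younger_only], out_older[younger_only])
    return or_older_exposed, or_younger_exposed


if __name__ == "__main__":
    or_older, or_younger = simulate()
    print(f"Older sibling exposed:   OR = {or_older:.2f}")
    print(f"Younger sibling exposed: OR = {or_younger:.2f}")
```

Within each stratum, exposure and birth order are the same variable, so the comparison of exposed with unexposed siblings is really a comparison of older with younger siblings. The two estimates therefore diverge in opposite directions even though the exposure has no effect.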
Jones, L.; Barnett, A.; Hartel, G.; Vagenas, D.
Background: In health research, variability in modelling decisions can lead to different conclusions even when the same data are analysed, a challenge known as inferential reproducibility. In linear regression analyses, incorrect handling of key assumptions, such as normality of the residuals and linearity, can undermine reproducibility. This study examines how violations of these assumptions influence inferential conclusions when the same data are reanalysed. Methods: We randomly sampled 95 health-related PLOS ONE papers from 2019 that reported linear regression in their methods. Data were available for 43 papers, and 20 were assessed for computational reproducibility, with three models per paper evaluated. The 14 papers that included a model at least partially computationally reproduced were then examined for inferential reproducibility. To assess the impact of assumption violations, differences in coefficients, 95% confidence intervals, and model fit were compared. Results: Of the fourteen papers assessed, only three were inferentially reproducible. The most frequently violated assumptions were normality and independence, each occurring in eight papers. Violations of independence were particularly consequential and were commonly associated with inferential failure. Although reproduced analyses often retained the same binary statistical significance classification as the original studies, confidence intervals were frequently wider, indicating greater uncertainty and reduced precision. Such uncertainty may affect the interpretation of results and, in turn, influence treatment decisions and clinical practice. Conclusion: Our findings demonstrate that substantial violations of key modelling assumptions often went undetected by authors and peer reviewers and, in many cases, were associated with inferential reproducibility failure. This highlights the need for stronger statistical education and greater transparency in modelling decisions. 
Rather than applying rigid or misinformed rules, such as incorrectly testing the normality of the outcome variable, researchers should adopt modelling frameworks guided by the research question and the study design. When assumptions are violated, appropriate alternatives, such as robust methods, bootstrapping, generalized linear models, or mixed-effects models, should be considered. Given that assumption violations were common even in relatively simple regression models, early and sustained collaboration with statisticians is critical for supporting robust, defensible, and clinically meaningful conclusions.
Schmidt, C.; Samartsidis, P.; Seaman, S.; Emmanouil, B.; Foster, G.; Reid, L.; Smith, S.; De Angelis, D.
To minimise health disparities, equitable access to medical treatment is paramount. In a pioneering intervention, National Health Service England's Hepatitis C virus (HCV) programme has implemented country-wide peer support to boost treatment access. Peer support workers (peers) are individuals with relevant lived experience, who promote testing and treatment in marginalised populations underserved by traditional health services. We evaluated the English peers intervention, exploiting its staggered rollout and rich surveillance data between June 2016 and May 2021. Peers increased HCV cases identified by 13·9% (95% credible interval (95% CrI) [5·3, 21·7]), sustained viral responses by 8·0% (95% CrI [-4·4, 18·6]), and drug services referrals by 8·8% (95% CrI [-12·5, 22·6]). The intervention's effectiveness was magnified during the first COVID-19 lockdown and individuals supported by peers typically belonged to populations with poor treatment access. Our findings indicate that peers can boost equity in treatment access on a national scale.
Wei, M.; Yadlapati, L.; Peng, Q.
Background: The Adolescent Brain Cognitive Development (ABCD) Study provides rich longitudinal data on environmental, genetic, and behavioral factors related to substance use initiation. Classical marginal structural models (MSMs) require selecting covariates for propensity models, which is challenging when there are many correlated predictors. Methods: We analyzed longitudinal panel data from 11,868 ABCD participants with repeated observations over time. Interval-level binary outcomes were defined for initiation of alcohol, nicotine, cannabis, and any substance, including only participants at risk before initiation. All predictors were constructed as lagged variables to preserve temporal ordering. We used a two-stage machine learning-based causal framework. First, we performed graph discovery using a Granger-inspired lagged predictive modeling approach with elastic-net logistic regression to identify relationships between past predictors and future outcomes. Stable candidate edges were selected using subject-level bootstrap stability selection. Second, we estimated adjusted effects for stable predictors using double machine learning (DML) with partialling-out and cross-fitting. For each predictor, the lagged variable was treated as the exposure and adjusted for high-dimensional lagged covariates. Cross-fitting with group-based splitting accounted for within-subject dependence. Nuisance functions were estimated using random forests, and cluster-robust standard errors were used for inference. Results: We identified stable predictors across multiple domains, including sleep patterns, family environment, peer relationships, behavioral traits, and genetic risk. Many predictors were shared across substance outcomes, while some were outcome-specific. Effect sizes were modest, typically ranging from -0.01 to 0.02 per standard deviation increase in the predictor. Both risk-increasing and protective associations were observed. 
Risk factors included sleep disturbance and behavioral risk indicators, while protective factors included parental monitoring and structured environments. Conclusions: This study presents a practical framework for analyzing high-dimensional longitudinal data and identifying time-varying predictors of substance use initiation. The approach combines machine learning for variable selection with causal inference for effect estimation. The results highlight both shared and outcome-specific risk factors and identify modifiable targets, such as family environment and sleep, that may inform prevention strategies.
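The second stage described in this abstract, double machine learning with partialling-out, group-based cross-fitting, and cluster-robust standard errors, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `dml_partialling_out`, the simulated data, and the true effect of 0.5 are assumptions, and ordinary least squares is swapped in for the random forest nuisance learners to keep the example dependent on NumPy only. The demonstration also uses a continuous outcome rather than the binary initiation outcomes of the study.

```python
import numpy as np

def dml_partialling_out(y, d, X, groups, n_folds=5, seed=0):
    """Partialling-out DML estimate of the effect of d on y given X.

    Folds are assigned per group (subject) so that repeated observations of
    one subject never appear in both the training and the held-out fold.
    Nuisance functions E[y|X] and E[d|X] are fit by OLS here (an assumption;
    the abstract reports random forests).
    """
    rng = np.random.default_rng(seed)
    unique_groups, inv = np.unique(groups, return_inverse=True)
    fold_of_group = rng.permutation(len(unique_groups)) % n_folds
    folds = fold_of_group[inv]

    Xc = np.column_stack([np.ones(len(y)), X])  # add intercept
    y_res = np.empty(len(y))
    d_res = np.empty(len(y))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        beta_y, *_ = np.linalg.lstsq(Xc[train], y[train], rcond=None)
        beta_d, *_ = np.linalg.lstsq(Xc[train], d[train], rcond=None)
        # Out-of-fold residuals: outcome and exposure with X partialled out
        y_res[test] = y[test] - Xc[test] @ beta_y
        d_res[test] = d[test] - Xc[test] @ beta_d

    # Residual-on-residual regression gives the effect estimate
    theta = (d_res @ y_res) / (d_res @ d_res)

    # Cluster-robust (sandwich) standard error: sum scores within subject
    score = d_res * (y_res - theta * d_res)
    cluster_sums = np.bincount(inv, weights=score)
    se = np.sqrt(np.sum(cluster_sums ** 2)) / (d_res @ d_res)
    return theta, se

# Simulated panel (hypothetical): 500 subjects, 4 observations each
rng = np.random.default_rng(1)
n_subjects, n_obs, n_cov = 500, 4, 10
groups = np.repeat(np.arange(n_subjects), n_obs)
X = rng.normal(size=(n_subjects * n_obs, n_cov))
d = X @ rng.normal(size=n_cov) + rng.normal(size=len(groups))      # lagged exposure
y = 0.5 * d + X @ rng.normal(size=n_cov) + rng.normal(size=len(groups))

theta, se = dml_partialling_out(y, d, X, groups)
print(f"theta = {theta:.3f}, cluster-robust SE = {se:.3f}")
```

With flexible learners such as random forests in place of OLS, the same residual-on-residual step applies; only the two nuisance fits inside the loop would change.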